Search for the standard model Higgs boson decaying to a bb̄ pair in events with no charged leptons and large missing transverse energy using the full CDF data set
We report on a search for the standard model Higgs boson produced in association with a vector boson in the full data set of proton-antiproton collisions at √s = 1.96 TeV recorded by the CDF II detector at the Tevatron, corresponding to an integrated luminosity of 9.45 fb⁻¹. We consider events having no identified charged lepton, a transverse energy imbalance, and two or three jets, of which at least one is consistent with originating from the decay of a b quark. We place 95% credibility level upper limits on the production cross section times standard model branching fraction for several mass hypotheses between 90 and 150 GeV/c². For a Higgs boson mass of 125 GeV/c², the observed (expected) limit is 6.7 (3.6) times the standard model prediction.
In the standard model (SM), the mechanism responsible for spontaneous electroweak symmetry breaking gives mass to the W and Z bosons [2]. The Higgs boson (H) represents the remaining degree of freedom after the symmetry is broken and also allows fermions to acquire mass through Yukawa couplings. The SM does not predict the mass of the Higgs boson, m_H, but the combination of precision electroweak measurements [3], including recent top-quark and W boson mass measurements from the Tevatron […], constrains m_H < 152 GeV/c² at the 95% confidence level. Direct searches at LEP2 […], the Tevatron […], and the LHC [1] exclude all possible masses of the SM Higgs boson at the 95% confidence level or the 95% credibility level (C.L.), except within the ranges 116.6–119.4 GeV/c² and 122.1–127 GeV/c².

A SM Higgs boson in this mass range would be produced in the √s = 1.96 TeV pp̄ collisions of the Tevatron and would have a branching fraction to bb̄ greater than 50% […]. In these currently allowed regions, H → bb̄ is the dominant decay mode, but large QCD multijet backgrounds overwhelm searches in the exclusive bb̄ final state. Searches for H produced in association with a vector boson, VH (V = W or Z), where the vector boson decays leptonically, access final states with significantly higher signal-to-background ratios than those resulting from gg → H → bb̄.

This Letter presents a search for VH production in events with missing transverse energy (E̸_T) [12], a signature of neutrinos escaping detection, and b jets in a data set corresponding to an integrated luminosity of 9.45 fb⁻¹ collected using the CDF II detector at the Fermilab Tevatron. This analysis considers Z(→ νν̄)H production, where the neutrinos (ν) escape detection, and Z(→ ℓ⁺ℓ⁻)H production when neither charged lepton is identified or they are reconstructed as jets. We are also sensitive to WH events where W → eν or W → τν and the charged lepton is reconstructed as a jet, or where it is not identified. By building upon techniques used for the observation of single-top-quark production [13], we significantly increase the signal acceptance with respect to previous Tevatron searches in this final state […].
CDF II is a multipurpose collider detector described in Ref. […]. A three-level online event-selection system (trigger) is used to select events for analysis. Events are selected via the logical OR of two trigger paths [17] requiring either large E̸_T, or large E̸_T and two jets […]. The efficiency of this selection is obtained from data and applied to the Monte Carlo (MC) simulated samples to reproduce the inefficiencies present in the data. The parametrization of the trigger efficiency [19] significantly improves the modeling of the trigger turn-on outside the fully efficient region, as verified using data control samples. This allows significantly relaxed preselection requirements compared to those of Ref. [14]. The parametrization is done using a neural network (NN) [20] trained on the following inputs: the E̸_T in the event, its azimuth φ(E̸_T), three variables characterizing the ith jet j_i in the event, namely E_T(j_i), η(j_i), and φ(j_i), and the η-φ separation of each pair of jets, ΔR = √((Δφ)² + (Δη)²) [12]. We thus have 9 (14) input variables for events with two (three) jets. We use a muon-triggered sample to define the nominal parametrization and derive the trigger systematic uncertainty from a parametrization obtained with an inclusive jet sample containing at least one jet with E_T > 50 GeV. The efficiency ranges from 0.40 for events having E̸_T = 35 GeV to 1.0 for events with E̸_T > 80 GeV.
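The input list above can be sketched in Python as follows. This is a minimal illustration, assuming jets are given as (E_T, η, φ) tuples; the function names are hypothetical, and the ordering of the inputs and any preprocessing used in the actual CDF network are not specified in the text. Note that Δφ must be wrapped into [0, π] before ΔR is formed.

```python
import math
from itertools import combinations


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal separation wrapped into [0, pi]."""
    d = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - d if d > math.pi else d


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """eta-phi separation, sqrt(dphi^2 + deta^2)."""
    return math.hypot(delta_phi(phi1, phi2), eta1 - eta2)


def trigger_nn_inputs(met: float, met_phi: float, jets: list) -> list:
    """Assemble inputs for a trigger-efficiency NN (hypothetical helper).

    jets: two or three (et, eta, phi) tuples.
    Output: [MET, phi(MET)] + (E_T, eta, phi) per jet + Delta R per jet pair,
    i.e. 2 + 3*2 + 1 = 9 values for two jets and 2 + 3*3 + 3 = 14 for three.
    """
    if len(jets) not in (2, 3):
        raise ValueError("expected two or three jets")
    features = [met, met_phi]
    for et, eta, phi in jets:
        features.extend([et, eta, phi])
    for (_, eta1, phi1), (_, eta2, phi2) in combinations(jets, 2):
        features.append(delta_r(eta1, phi1, eta2, phi2))
    return features
```

The pairwise ΔR terms are what make the count grow from 9 to 14 when a third jet is present.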

We reconstruct jets from energy depositions in the calorimeter towers using a cone jet-clustering algorithm […] with a cone of radius ΔR = 0.4. In addition to the standard jet-energy corrections used by CDF […], we adjust the energy of each jet according to the measured momenta of the charged-particle tracks within the jet cone [21]. We further improve the energy determination using a NN approach to estimate the energies of the initiating quarks. The direction and magnitude of the E̸_T are then recomputed. These jet reconstruction methods improve both the signal acceptance and the relative resolution of the reconstructed invariant mass of the Higgs boson candidate by about 15%. We reject events with an identified e or μ to maintain statistical independence from other CDF analyses searching for the SM Higgs boson […].
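The text does not spell out how the E̸_T is recomputed after the jet corrections. A common approach, assumed in this sketch, is to subtract the change in each jet's transverse-energy vector from the missing-E_T vector, since E̸_T is the magnitude of minus the vector sum of the visible transverse energy. The function name is hypothetical.

```python
import math


def propagate_jet_corrections_to_met(met, met_phi, raw_jets, corrected_jets):
    """Update the missing-E_T vector after jet-energy corrections (sketch).

    raw_jets and corrected_jets are matching lists of (et, phi) for the same
    jets. Because missing E_T is minus the vector sum of the visible transverse
    energy, the change in each jet's E_T vector is subtracted from it.
    Returns the new (magnitude, azimuth).
    """
    mex = met * math.cos(met_phi)
    mey = met * math.sin(met_phi)
    for (et_raw, phi_raw), (et_cor, phi_cor) in zip(raw_jets, corrected_jets):
        mex -= et_cor * math.cos(phi_cor) - et_raw * math.cos(phi_raw)
        mey -= et_cor * math.sin(phi_cor) - et_raw * math.sin(phi_raw)
    return math.hypot(mex, mey), math.atan2(mey, mex)
```

For example, raising a jet at φ = π from 40 to 50 GeV, opposite a 50 GeV E̸_T at φ = 0, increases the E̸_T to 60 GeV in this scheme.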

After the events are reconstructed, the following preselection requirements are applied. We select events with E̸_T > 35 GeV and two or three jets satisfying E_T > 15 GeV and |η| < 2.4, thus accepting events where additional partons provide an extra jet candidate, or where a lepton (e or τ) is reconstructed as a jet. The two most energetic jets, j1 and j2, are required to have reconstructed transverse energies of at least 25 and 20 GeV, respectively, to satisfy |η(j_i)| < 2, and to be separated by ΔR(j1, j2) > 0.8; at least one of these two jets must also satisfy |η| < 0.9.
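A minimal sketch of the kinematic preselection above, assuming jets are available as simple (E_T, η, φ) records. The lepton veto, the angular requirements, the tagging requirements, and CDF's detailed jet definitions are omitted, and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Jet:
    et: float   # transverse energy in GeV
    eta: float  # pseudorapidity
    phi: float  # azimuth in radians


def delta_r(a: Jet, b: Jet) -> float:
    dphi = abs(a.phi - b.phi) % (2.0 * math.pi)
    dphi = min(dphi, 2.0 * math.pi - dphi)
    return math.hypot(dphi, a.eta - b.eta)


def passes_preselection(met: float, jets: list) -> bool:
    """Kinematic preselection (sketch; lepton veto, Delta-phi cuts, tagging omitted)."""
    if met <= 35.0:
        return False
    # Keep jets with E_T > 15 GeV and |eta| < 2.4, ordered by E_T.
    good = sorted((j for j in jets if j.et > 15.0 and abs(j.eta) < 2.4),
                  key=lambda j: j.et, reverse=True)
    if len(good) not in (2, 3):
        return False
    j1, j2 = good[0], good[1]
    if j1.et < 25.0 or j2.et < 20.0:
        return False
    if abs(j1.eta) >= 2.0 or abs(j2.eta) >= 2.0:
        return False
    if delta_r(j1, j2) <= 0.8:
        return False
    # At least one of the two leading jets must be central.
    return abs(j1.eta) < 0.9 or abs(j2.eta) < 0.9
```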

This selection is relaxed with respect to Ref. [14] and increases the signal acceptance by a factor of 1.4, at the cost of a tenfold increase in the background acceptance. One of the leading sources of significant E̸_T in quantum chromodynamics (QCD) production of multijet events (QCD MJ) is the mismeasurement of jet energies. Neutrinos from semileptonic b decays can also produce significant E̸_T in QCD MJ events. In both cases, the E̸_T is often aligned with one of the jets, and such events are rejected by requiring Δφ(E̸_T, E_T^{j1}) > 1.5 and Δφ(E̸_T, E_T^{j2,3}) > 0.4. This reduces the backgrounds by a factor of three while retaining 90% of the signal. The large backgrounds from light-flavor jets originating from u, d, or s quarks or gluons are reduced by identifying (tagging) jets consistent with the decay of b quarks; c quarks are not explicitly identified.
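The angular requirements can be sketched as below. Two assumptions are made here: jets are ordered by E_T, and the Δφ(E̸_T, E_T^{j2,3}) > 0.4 requirement is applied to each of j2 and j3 (when present). The function names are hypothetical.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal separation wrapped into [0, pi]."""
    d = abs(phi1 - phi2) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def passes_qcd_angular_cuts(met_phi: float, jet_phis: list) -> bool:
    """Reject events whose missing E_T is aligned with a jet.

    jet_phis: azimuths of the selected jets ordered by E_T (j1, j2[, j3]).
    Requires dphi(MET, j1) > 1.5 and dphi(MET, j) > 0.4 for j2 and j3.
    """
    if delta_phi(met_phi, jet_phis[0]) <= 1.5:
        return False
    return all(delta_phi(met_phi, phi) > 0.4 for phi in jet_phis[1:])
```

The wrapping in delta_phi matters: azimuths of 3.1 and −3.1 rad are only about 0.08 rad apart, so such a configuration fails the j2 requirement.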

We use two algorithms to tag b-quark jets: SECVTX [24], which attempts to reconstruct the secondary vertex from the b decay (displaced from the interaction point because hadrons containing b or c quarks can travel a few millimeters in the detector before decaying), and JETPROB [25], which determines for each jet the probability that the tracks within the jet are consistent with originating from the primary vertex. We operate SECVTX (JETPROB) at about 40% (50%) efficiency, yielding a rate of light-flavor jets mistakenly identified as b jets (mistags) of about 1% (5%). We exploit the different purities of the selected tagged events by considering independent tagging categories separately and later combining the results. We require that one of the two leading jets be tagged by SECVTX and that the other be tagged by SECVTX (SS) or by JETPROB (SJ), or be untagged (1S). The tagging requirements reduce the backgrounds by two orders of magnitude while retaining about 50% of the signal. Events satisfying the criteria above make up the preselection sample. The signal-to-background ratio (S/B) in this sample is estimated to be about 1/400 in the SS tagging category for m_H = 125 GeV/c², compared to less than 10⁻⁵ for the full sample of triggered events. The relative fractions of signal events with Z → νν̄, Z → ℓ⁺ℓ⁻, and W → ℓν are 47%, 3%, and 50%, respectively; of the W → ℓν events, the fractions with electron (e), muon (μ), and tau (τ) decays are 30%, 20%, and 50%, respectively.
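One way to encode the exclusive tagging categories, assuming per-jet SECVTX and JETPROB decisions are available as booleans for the two leading jets. The function is a hypothetical illustration; giving SS precedence when the second jet carries both tags is an assumption.

```python
from typing import Optional, Tuple


def tagging_category(secvtx: Tuple[bool, bool], jetprob: Tuple[bool, bool]) -> Optional[str]:
    """Assign an event to SS, SJ, or 1S from the tags on its two leading jets.

    secvtx, jetprob: per-jet tag decisions for (j1, j2). Returns None if neither
    leading jet is SECVTX-tagged (the event fails the tagging requirement).
    """
    n_secvtx = sum(secvtx)
    if n_secvtx == 2:
        return "SS"
    if n_secvtx == 0:
        return None
    other = 1 if secvtx[0] else 0  # index of the jet without a SECVTX tag
    return "SJ" if jetprob[other] else "1S"
```

Because each event lands in at most one category, the categories are statistically independent and their likelihoods can later be multiplied.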

Backgrounds from top-quark pair and electroweak single-top production (top), V+jets events, and diboson (VV) events are all modeled via simulation. The ALPGEN generator [26] is used to estimate V+jets (including the ratio of light- to heavy-flavor events), POWHEG [27] for electroweak production of top quarks, and PYTHIA [28] for top-quark pair production and VV events, as well as for the VH signal. The parton showering is performed by PYTHIA. The event generation process includes a simulation of the detector response […], and the resulting samples are subjected to the same reconstruction and analysis chain as the data. The normalization of the simulated samples is described in Ref. […]. Electroweak (EWK) mistags, events with light-flavor jets that are wrongly tagged, are mostly due to V+jets and are determined from light-flavor simulated samples weighted by a per-event mistag probability, obtained for each algorithm from an orthogonal data sample [24, 25].
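The text states only that a per-event mistag probability is applied to light-flavor simulated events. If, as assumed in this sketch, the two leading jets have independent per-jet mistag probabilities p1 and p2 (for instance, about 1% each for SECVTX), the per-event probabilities follow from elementary combinatorics, and the EWK mistag yield is the weighted sum over simulated events. The names are hypothetical.

```python
def event_mistag_probabilities(p1: float, p2: float) -> dict:
    """Per-event mistag probabilities for two leading jets.

    Assumes independent per-jet mistag probabilities p1 and p2.
    """
    return {
        "two_tags": p1 * p2,
        "exactly_one_tag": p1 * (1.0 - p2) + (1.0 - p1) * p2,
        "at_least_one_tag": 1.0 - (1.0 - p1) * (1.0 - p2),
    }


def predicted_mistag_yield(events, category: str) -> float:
    """Weighted sum over light-flavor simulated events.

    events: iterable of (mc_weight, p1, p2); category: a key returned by
    event_mistag_probabilities.
    """
    return sum(w * event_mistag_probabilities(p1, p2)[category] for w, p1, p2 in events)
```

Weighting every simulated event, instead of applying a random tag decision, avoids discarding the roughly 99% of events that would otherwise carry no tag.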

The background contribution from QCD MJ events is difficult to describe accurately with the simulation and is therefore modeled using an independent data sample. We predict the QCD MJ contribution from data events with Δφ(E̸_T, E_T^{j2,3}) < 0.4 and 35 < E̸_T < 70 GeV. In this sample, we measure the contamination from events with heavy-flavor jets or light-flavor mistags that fall into one of the three previously described tagging categories [19].

FIG. 1: The distribution of NN_QCD for events satisfying the preselection criteria. The signal region is defined by NN_QCD > 0.45. The normalization of the QCD MJ contribution is determined from the data. The uncertainty includes all statistical and systematic contributions (see text).

Following Ref. [3], we parametrize this category-tagging rate (the ratio of category-tagged events to events satisfying taggability requirements) in bins of four variables: the magnitude of the negative vector sum of the transverse momenta of the charged tracks within the jet (p̸_T) [13]; the scalar sum H_T of the transverse energies of j1, j2, and j3 (where applicable); and the jet variables Z(j1) and Z(j2) [Z(j) = Σ p_T …]. We define one four-dimensional matrix (M_TR) for each tagging category. The large data sample available allows us to improve this model by defining an event-based M_TR instead of a jet-based M_TR, so that correlations between the jets in each event are properly taken into account. We use the M_TR to predict the QCD MJ contribution in the preselection sample, which has the same flavor composition before tagging requirements as the region from which the M_TR is derived. The QCD MJ background normalization in each tagging category is determined from the corresponding M_TR after subtracting the contributions from all other background sources, which are estimated using simulated events. The model is validated in the various control regions defined below.
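The tag-rate-matrix idea can be sketched with NumPy: the matrix is the per-bin ratio of category-tagged to taggable events in the derivation region, and the prediction for a target region is the sum of the matrix entries looked up for each of its pretag events. This is an illustration under simplifying assumptions: the bin edges are hypothetical, the subtraction of simulated non-QCD backgrounds described above is omitted, and in the prediction step events outside the binning are assigned to the nearest edge bin.

```python
import numpy as np


def build_tag_rate_matrix(x_taggable, tagged_mask, edges):
    """Event-based tag-rate matrix (sketch).

    x_taggable: (N, 4) array of the parametrization variables of taggable
                events in the derivation region (e.g. missing p_T, H_T, Z(j1), Z(j2)).
    tagged_mask: (N,) boolean array, True if the event is in the tagging category.
    edges: list of four bin-edge arrays.
    Returns the per-bin ratio tagged / taggable (0 where a bin is empty).
    """
    n_all, _ = np.histogramdd(x_taggable, bins=edges)
    n_tag, _ = np.histogramdd(x_taggable[tagged_mask], bins=edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(n_all > 0, n_tag / n_all, 0.0)


def predict_tagged_yield(matrix, x_pretag, edges):
    """Sum the tag rates of the pretag events in the target region."""
    idx = [np.clip(np.searchsorted(e, x_pretag[:, k], side="right") - 1, 0, len(e) - 2)
           for k, e in enumerate(edges)]
    return float(matrix[tuple(idx)].sum())
```

Building the matrix per event, rather than per jet, means the joint configuration of both leading jets enters each bin, which is how the jet-jet correlations mentioned above are retained.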

We employ an artificial neural network, NN_QCD, to further separate the dominant QCD MJ background from the signal and other backgrounds. We train a 14-variable feed-forward multilayer perceptron using activity-derived (E̸_T), angular (Δφ(E̸_T, p̸_T) and the angular separations between E̸_T, p̸_T, and the jet directions), and event-shape (centrality and sphericity [31]) observables [19]. Figure 1 shows the distribution of NN_QCD in the preselection sample; the QCD MJ backgrounds (peaking at 0) are well separated from the signal (peaking at 1). We retain only events with NN_QCD > 0.45 (signal region), rejecting about 90% (70%) of the QCD MJ (overall) background while keeping 95% of the signal. This represents a 15% increase in background rejection for the same signal acceptance compared to Ref. [3]. The S/B in the signal region is about 1/60 in the SS tagging category for m_H = 125 GeV/c², similar to that of the corresponding tagging category of Ref. [32].
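The quoted performance of the NN_QCD > 0.45 requirement corresponds to a simple weighted efficiency calculation. The sketch below, with hypothetical names and arbitrary inputs, returns the signal efficiency and the background rejection for a given threshold.

```python
import numpy as np


def cut_performance(nn_signal, nn_background, threshold=0.45,
                    w_signal=None, w_background=None):
    """Signal efficiency and background rejection for the cut NN > threshold.

    Optional per-event weights (e.g. MC normalization weights) can be supplied.
    """
    nn_signal = np.asarray(nn_signal, dtype=float)
    nn_background = np.asarray(nn_background, dtype=float)
    ws = np.ones_like(nn_signal) if w_signal is None else np.asarray(w_signal, dtype=float)
    wb = np.ones_like(nn_background) if w_background is None else np.asarray(w_background, dtype=float)
    eff_sig = ws[nn_signal > threshold].sum() / ws.sum()
    rej_bkg = 1.0 - wb[nn_background > threshold].sum() / wb.sum()
    return float(eff_sig), float(rej_bkg)
```

Scanning the threshold with such a function is one way to compare working points at fixed signal acceptance, as done when comparing with the earlier analysis.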

We employ a second network, NN_SIG, to discriminate the expected signal from the remaining backgrounds. Seven input variables are used for this purpose: the invariant mass of the two leading jets, m(j1, j2); the invariant mass of the p̸_T and all jets; the differences H_T − E̸_T and H̸_T − E̸_T, where H̸_T is the magnitude of the negative vector sum of the jet E_T's; the maximum ΔR between the jets; the output of NN_QCD; and the output of a NN using tracking information to separate events with intrinsic E̸_T from those with instrumental E̸_T [33].
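Several NN_SIG inputs are simple functions of the jet kinematics. This sketch computes m(j1, j2), H_T, and H̸_T under the assumption, not stated in the text, that jets are treated as massless with E_T used as the transverse momentum; the names are hypothetical.

```python
import math


def jet_four_vector(et: float, eta: float, phi: float):
    """(E, px, py, pz) of a massless jet, using E_T as the transverse momentum."""
    px, py, pz = et * math.cos(phi), et * math.sin(phi), et * math.sinh(eta)
    return math.sqrt(px * px + py * py + pz * pz), px, py, pz


def invariant_mass(*vectors) -> float:
    """Invariant mass of the summed four-vectors (clipped at zero)."""
    e, px, py, pz = (sum(v[k] for v in vectors) for k in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def ht_and_missing_ht(jets):
    """Scalar sum H_T and missing H_T (|negative vector sum|) of jet E_T's.

    jets: list of (et, eta, phi).
    """
    ht = sum(et for et, _, _ in jets)
    mhx = -sum(et * math.cos(phi) for et, _, phi in jets)
    mhy = -sum(et * math.sin(phi) for et, _, phi in jets)
    return ht, math.hypot(mhx, mhy)
```

Given an event's E̸_T, the inputs H_T − E̸_T and H̸_T − E̸_T then follow by subtraction.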

We avoid potential bias by testing our understanding of the SM backgrounds in several control samples where the expected amount of signal is negligible. We define an EWK region (Fig. 2(a)) by requiring events to have at least one charged lepton in addition to satisfying the preselection criteria. This region is sensitive to top-quark pair and V+jets production and, to a lesser extent, to VV and electroweak single-top-quark production, and is used to validate the simulation against the data. We also define the MJ1, MJ2, and MJ3 control regions, which contain no identified lepton and are dominated by QCD processes. MJ1 (Fig. 2(b)) contains events with Δφ(E̸_T, E_T^{j2,3}) < 0.4 and E̸_T > 70 GeV. MJ2 contains events satisfying the preselection requirements and NN_QCD < 0.1, and is the region where the QCD MJ normalization is obtained from the data. MJ3, defined from preselection events with 0.1 < NN_QCD < 0.45, serves as a final consistency check of the overall normalization. Finally, we validate our background model in the preselection region before proceeding with the final fit in the signal region. We check the distributions of multiple kinematic variables, including all inputs to NN_QCD and to the final discriminant function NN_SIG, as well as the outputs of these two networks, in all our control samples [19]. We obtain good agreement between the data and our SM background model in all the samples, with only the normalization of the QCD MJ component determined from the fit to data.
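The region definitions can be summarized as a classification function. This is a simplified, hypothetical sketch: the flag passes_other_presel stands for all preselection requirements other than the lepton veto and the Δφ(E̸_T, E_T^{j2,3}) requirement, and whether MJ1 also imposes the remaining preselection cuts is an assumption, since the text does not say.

```python
from typing import Optional


def classify_region(n_leptons: int, min_dphi_met_j23: float, met: float,
                    passes_other_presel: bool, nn_qcd: float) -> Optional[str]:
    """Return 'EWK', 'MJ1', 'MJ2', 'MJ3', 'signal', or None (sketch).

    min_dphi_met_j23: smallest Delta-phi between missing E_T and j2 (or j3).
    """
    if not passes_other_presel:
        return None
    if min_dphi_met_j23 < 0.4:
        # Inverted angular requirement: multijet-enriched sideband.
        return "MJ1" if (n_leptons == 0 and met > 70.0) else None
    if n_leptons > 0:
        return "EWK"  # preselection plus at least one charged lepton
    if nn_qcd > 0.45:
        return "signal"
    return "MJ3" if nn_qcd > 0.1 else "MJ2"
```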

The distribution of NN_SIG is validated in our control samples, as shown in Fig. 2 for events with two b tags. Figure 2(c) shows the distribution of NN_SIG in the signal region for events with two b tags. The expected numbers of events are compared to the observed yields in Table I. For m_H = 125 GeV/c², we expect a total of 21 (16) signal events with one (two) b-tagged jets.

TABLE I: Comparison of the numbers of expected and observed events in the signal region for the different tagging categories. The uncertainties include all statistical and systematic contributions (see text) [34].

Process                 2 b tags (SS + SJ)    1 b tag (1S)
VV                      62 ± 7.5              293 ± 32
Top                     370 ± 52              1015 ± 128
V + heavy flavor        424 ± 81              3680 ± 675
EWK mistags             55 ± 26               2288 ± 283
QCD MJ                  1300 ± 31             10825 ± 177
Total                   2211 ± 197            18100 ± 1295
Data                    2117                  18165
Expected Higgs boson signal for m_H = 125 GeV/c²
7.6
WH → ℓνbb̄

FIG. 2: The distribution of the final discriminant function, NN_SIG, for events with two b tags (SS + SJ categories) in (a) the EWK control sample, (b) the MJ1 control sample, and (c) the signal region (NN_QCD > 0.45). Only the normalization of the QCD MJ background is fit to the data.

We perform a binned likelihood fit to probe for a VH signal in the presence of SM backgrounds. The likelihood is the product of Poisson probabilities over the bins of the NN_SIG distribution. The mean number of expected events in each bin includes contributions from each background source and from the VH processes (assuming a given value of m_H). We employ a Bayesian likelihood method [35] with a flat, non-negative prior probability for the SM Higgs boson production cross section times branching fraction, σ(VH) × B(H → bb̄), and truncated Gaussian priors for the uncertainties on the acceptance and shape of the backgrounds. We combine the three tagging categories by taking the product of their likelihoods and simultaneously varying the correlated uncertainties. All systematic uncertainties except those associated with the QCD MJ and the EWK mistags are treated as fully correlated across the tagging categories.
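The likelihood and the flat-prior Bayesian limit can be sketched in NumPy as follows. This deliberately omits the nuisance parameters: the analysis integrates over truncated Gaussian priors for the systematic uncertainties, whereas here the background and signal templates are held fixed. The signal strength mu is the ratio to the SM prediction, the per-bin background must be positive, and combining tagging categories is the product of their likelihoods, i.e. the sum of their log-likelihoods. The function names are hypothetical.

```python
import numpy as np


def log_likelihood(mu, categories):
    """Binned Poisson log-likelihood summed over tagging categories.

    categories: list of (observed, background, signal) per-bin arrays.
    The constant log(n!) terms are dropped; background must be > 0 in every bin.
    """
    total = 0.0
    for obs, bkg, sig in categories:
        lam = np.asarray(bkg, dtype=float) + mu * np.asarray(sig, dtype=float)
        total += float(np.sum(np.asarray(obs, dtype=float) * np.log(lam) - lam))
    return total


def bayesian_upper_limit(categories, cl=0.95, mu_max=50.0, n_grid=20001):
    """Credible upper limit on mu with a flat prior on mu >= 0 (no nuisance parameters)."""
    mus = np.linspace(0.0, mu_max, n_grid)
    logl = np.array([log_likelihood(m, categories) for m in mus])
    posterior = np.exp(logl - logl.max())  # unnormalized posterior for a flat prior
    cdf = np.cumsum(posterior)
    cdf /= cdf[-1]
    return float(np.interp(cl, cdf, mus))
```

As a check, one bin with no observed events, unit background, and unit signal gives a posterior proportional to exp(−mu), so the 95% limit is close to −ln 0.05 ≈ 3.0; two such independent categories halve it.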
The uncertainties from the simulation statistics and those on the normalizations of top-quark (10%), diboson (6%), V+jets (30%), QCD MJ (1 to 3%), and EWK mistag (20 to 65%) production are not correlated. The shapes obtained by varying the M_TR (mistag) probabilities by one standard deviation from their central values are applied as shape uncertainties for the QCD MJ (EWK mistag) background. The correlated uncertainties, which apply to both the signal and the EWK backgrounds, include the luminosity measurement (6%), b-tagging efficiency (5 to 10%), trigger efficiency (3 to 5%), lepton veto efficiency (2%), parton distribution functions (3%), and up to 11% for the jet-energy scale […]. We also determine the shape uncertainties on NN_SIG due to the jet-energy scale and the trigger efficiency. These two also affect the QCD MJ background through the background subtraction procedure described above. Initial- and final-state radiation uncertainties (2 to 3%) are applied only to the VH signal.
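The text does not specify how the ±1σ shape templates are interpolated. A common choice, assumed in this sketch, is piecewise-linear vertical morphing in a nuisance parameter θ, combined with a fractional rate uncertainty; truncating the Gaussian prior where the yield would become negative keeps the expectation physical. The names are hypothetical.

```python
import numpy as np


def morph_template(nominal, up, down, theta):
    """Piecewise-linear vertical interpolation between +-1 sigma templates.

    theta = 0 gives the nominal shape, +1 (-1) the up (down) variation.
    """
    nominal, up, down = (np.asarray(a, dtype=float) for a in (nominal, up, down))
    if theta >= 0.0:
        shifted = nominal + theta * (up - nominal)
    else:
        shifted = nominal + theta * (nominal - down)
    return np.clip(shifted, 0.0, None)  # keep bin contents non-negative


def expected_yield(nominal, up, down, theta_shape, rel_rate_unc, theta_rate):
    """Shape variation combined with a fractional normalization uncertainty."""
    return morph_template(nominal, up, down, theta_shape) * (1.0 + rel_rate_unc * theta_rate)


def sample_truncated_gaussian(rng, lower, size):
    """Standard-normal draws, rejecting values below `lower` (e.g. -1/rel_rate_unc)."""
    draws = []
    while len(draws) < size:
        x = rng.standard_normal()
        if x >= lower:
            draws.append(x)
    return np.array(draws)
```

For a 30% rate uncertainty such as the V+jets normalization, truncating at θ = −1/0.3 guarantees that the scaled yield never drops below zero.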

We compute 95% C.L. upper limits on σ(VH) × B(H → bb̄) for 90 < m_H < 150 GeV/c² in 5 GeV/c² steps using the methodology described in Ref. [36]. The expected and observed upper limits are shown in Table II. We test the consistency of the observed limits with the signal hypothesis by statistical sampling of the signal-plus-background model (assuming m_H = 125 GeV/c²). These studies indicate that the median expected upper limit in the SM Higgs boson scenario is higher (by up to 2.5 units of the SM cross section) than that of the background-only hypothesis over the 90–150 GeV/c² range, and is consistent with the observed limits within one standard deviation.
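The consistency test with pseudo-experiments can be sketched as follows: Poisson toys are drawn from the background-only (μ = 0) or signal-plus-background (μ = 1) model, a flat-prior limit is computed for each toy, and the median and central 68% band are reported. As with the likelihood sketch, this is a simplified illustration without systematic uncertainties; any templates passed in are placeholders, and the function names are hypothetical.

```python
import numpy as np


def upper_limit(obs, bkg, sig, cl=0.95, mu_max=20.0, n_grid=4001):
    """Flat-prior Bayesian upper limit on mu for one binned channel (bkg > 0)."""
    mus = np.linspace(0.0, mu_max, n_grid)
    lam = bkg[None, :] + mus[:, None] * sig[None, :]      # (grid, bins)
    logl = (obs[None, :] * np.log(lam) - lam).sum(axis=1)  # log(n!) dropped
    post = np.exp(logl - logl.max())
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return float(np.interp(cl, cdf, mus))


def expected_limits(bkg, sig, mu_injected=0.0, n_toys=500, seed=0):
    """16th, 50th, and 84th percentiles of limits over Poisson pseudo-experiments.

    mu_injected = 0 samples the background-only hypothesis; mu_injected = 1
    samples the signal-plus-background (SM Higgs) hypothesis.
    """
    rng = np.random.default_rng(seed)
    bkg = np.asarray(bkg, dtype=float)
    sig = np.asarray(sig, dtype=float)
    limits = [upper_limit(rng.poisson(bkg + mu_injected * sig).astype(float), bkg, sig)
              for _ in range(n_toys)]
    return np.percentile(limits, [16, 50, 84])
```

Comparing the two medians reproduces the kind of statement made above: injecting the SM signal raises the median limit relative to the background-only expectation.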

In summary, we have performed a direct search for the SM Higgs boson decaying into bb̄ pairs using the full CDF II data sample, corresponding to 9.45 fb⁻¹ of integrated luminosity accumulated during Run II of the Tevatron. Improved techniques increase the sensitivity by roughly 15% with respect to a previous analysis [14], in addition to the improvement due to the larger integrated luminosity. We set 95% C.L. upper limits on σ(VH) × B(H → bb̄) for 90 < m_H < 150 GeV/c² in 5 GeV/c² increments. For a Higgs boson mass of 125 GeV/c², the observed limit is 6.7 times the SM prediction, consistent with the expected limit of 3.6 within two standard deviations.
FIG. 3: Observed and expected (median, for the background-only hypothesis) 95% C.L. upper limits on the VH cross section times B(H → bb̄) divided by the SM prediction, as a function of the Higgs boson mass. The bands indicate the 68% and 95% credibility regions within which the limits can fluctuate in the absence of signal.



TABLE II: Expected and observed 95% C.L. upper limits on the VH cross section times B(H → bb̄) and their ratio to the SM prediction […].

m_H         σ(VH) × B(H → bb̄) (pb)     Ratio to SM prediction
(GeV/c²)    Expected    Observed        Expected    Observed